TerrorCat: a Translation Error Categorization-based MT Quality Metric
Authors: Mark Fishel, Rico Sennrich, Maja Popović, Ondřej Bojar
Abstract
We present TerrorCat, a submission to the WMT’12 metrics shared task. TerrorCat uses the frequencies of automatically obtained translation error categories as the basis for pairwise comparison of translation hypotheses, which is in turn used to generate a score for every translation. The metric shows high overall correlation with human judgements on the system level and more modest results on the level of individual sentences.

Posted at the Zurich Open Repository and Archive (ZORA), University of Zurich. URL: https://doi.org/10.5167/uzh-63325 (published version). Originally published at: Fishel, Mark; Sennrich, Rico; Popović, Maja; Bojar, Ondřej (2012). TerrorCat: a translation error categorization-based MT quality metric. In: NAACL 2012 Seventh Workshop on Statistical Machine Translation, Montreal, Canada, 7–8 June 2012, pp. 64–70.
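A minimal sketch of the pairwise-comparison idea described in the abstract, with hypothetical system names and with each system's error profile collapsed to a single error count for illustration; the actual metric compares per-category error frequencies with a trained model rather than a raw count:

```python
from itertools import combinations

def pairwise_scores(error_counts):
    """Score each hypothesis by its share of pairwise wins.

    error_counts maps a (hypothetical) system name to its total
    error count; in this toy version the system with fewer errors
    wins each pairwise comparison.
    """
    wins = {name: 0 for name in error_counts}
    for a, b in combinations(error_counts, 2):
        if error_counts[a] < error_counts[b]:
            wins[a] += 1
        elif error_counts[b] < error_counts[a]:
            wins[b] += 1
    # Normalize by the number of comparisons each system takes part in.
    n = len(error_counts) - 1
    return {name: wins[name] / n for name in error_counts}

scores = pairwise_scores({"sysA": 12, "sysB": 7, "sysC": 9})
# sysB wins both of its comparisons, sysA loses both
```

The pairwise wins are thus aggregated into a single per-system score, which is what the metric correlates with human system-level rankings.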
Similar resources
Ranking Translations using Error Analysis and Quality Estimation
We describe TerrorCat, a submission to this year’s metrics shared task. It is a machine learning-based metric that is trained on manual ranking data from WMT shared tasks 2008–2012. Input features are generated by applying automatic translation error analysis to the translation hypotheses and calculating the error category frequency differences. We additionally experiment with adding quality es...
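The error category frequency differences mentioned above can be sketched as a feature vector over a fixed category inventory; the category names and helper function here are hypothetical, not the paper's actual error taxonomy:

```python
def error_freq_diff(freqs_a, freqs_b, categories):
    """Feature vector for one hypothesis pair: per-category
    error frequency differences (hypothesis A minus hypothesis B)."""
    return [freqs_a.get(c, 0) - freqs_b.get(c, 0) for c in categories]

cats = ["missing_word", "word_order", "lexical_choice"]
feats = error_freq_diff({"missing_word": 3, "word_order": 1},
                        {"missing_word": 1, "lexical_choice": 2},
                        cats)
# → [2, 1, -2]
```

A classifier trained on manual ranking data can then learn, from such difference vectors, which member of each pair humans preferred.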
When Multiwords Go Bad in Machine Translation
This paper addresses the impact of multiword translation errors in machine translation (MT). We have analysed translations of multiwords in the OpenLogos rule-based system (RBMT) and in the Google Translate statistical system (SMT) for the English-French, English-Italian, and English-Portuguese language pairs. Our study shows that, for distinct reasons, multiwords remain a problematic area for ...
Feasibility of Minimum Error Rate Training with a Human-Based Automatic Evaluation Metric
Minimum error rate training (MERT) involves choosing parameter values for a machine translation (MT) system that maximize performance on a tuning set as measured by an automatic evaluation metric, such as BLEU. The method is best when the system will eventually be evaluated using the same metric, but in reality, most MT evaluations have a human-based component. Although performing MERT with a h...
Automatic Improvement of Machine Translation Systems
Achieving high translation quality remains the most daunting challenge Machine Translation (MT) systems currently face. Researchers have explored a variety of methods for including translator feedback in the MT loop. However, most MT systems have failed to incorporate post-editing efforts beyond the addition of corrected translations to the parallel training data for Example-Based and Statistic...
Meta-Evaluation of a Diagnostic Quality Metric for Machine Translation
Diagnostic evaluation of machine translation (MT) is an approach to evaluation that provides finer-grained information compared to state-of-the-art automatic metrics. This paper evaluates DELiC4MT, a diagnostic metric that assesses the performance of MT systems on user-defined linguistic phenomena. We present the results obtained using this diagnostic metric when evaluating three MT systems tha...
Publication date: 2012